
    Characterizing Objects in Images using Human Context

    Humans have an unmatched ability to interpret detailed information about the objects in an image just by looking at it. In particular, they can effortlessly perform the following tasks: 1) localizing the various objects in the image and 2) assigning functionalities to the parts of the localized objects. This dissertation addresses the problem of helping vision systems accomplish these two goals. The first part of the dissertation concerns object detection in a Hough-based framework. To this end, the independence assumption between features is addressed by grouping them in a local neighborhood. We study the complementary nature of individual and grouped features and combine them to achieve improved performance. Further, we consider the challenging case of detecting small and medium-sized household objects under human-object interactions. We first evaluate appearance-based star and tree models. While the tree model is slightly better, appearance-based methods continue to suffer from deficiencies caused by human interactions. To address this, we successfully incorporate automatically extracted human pose as a form of context for object detection. The second part of the dissertation addresses the tedious process of manually annotating objects to train fully supervised detectors. We observe that videos of human-object interactions with activity labels can serve as weakly annotated examples of household objects. Since such objects cannot be localized through appearance or motion alone, we propose a framework that includes human-centric functionality to retrieve the common object. Designed to maximize data utility by detecting multiple instances of an object per video, the framework achieves performance comparable to its fully supervised counterpart. The final part of the dissertation concerns localizing functional regions, or affordances, within objects by casting the problem as one of semantic image segmentation. To this end, we introduce a dataset of human-object interactions with strong (i.e., pixel-level) and weak (i.e., click-point and image-level) affordance annotations. We propose a framework that utilizes both forms of weak labels and demonstrate that the effort spent on weak annotation can be further optimized using human context.
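    The Hough-based detection mentioned above rests on a voting step: each local feature casts a weighted vote for where the object center should be, and peaks in the vote accumulator become detections. The sketch below is a minimal illustration under stated assumptions, not the dissertation's implementation: the feature tuple layout `(x, y, dx, dy, weight)`, the learned center offsets `dx, dy`, the cell size, and the helper names `hough_vote` and `detect` are all hypothetical, and both feature grouping and human-pose context are omitted.

```python
from collections import Counter


def hough_vote(features, cell=4):
    """Accumulate object-center votes on a coarse grid.

    Each feature is a hypothetical tuple (x, y, dx, dy, weight): its image
    position, an offset to the object center (assumed learned beforehand),
    and a confidence weight.
    """
    acc = Counter()
    for x, y, dx, dy, w in features:
        cx, cy = x + dx, y + dy            # predicted object center
        acc[(cx // cell, cy // cell)] += w  # quantize into an accumulator cell
    return acc


def detect(features, cell=4):
    """Return the center of the strongest accumulator cell and its score."""
    acc = hough_vote(features, cell)
    (bx, by), score = max(acc.items(), key=lambda kv: kv[1])
    return (bx * cell + cell / 2, by * cell + cell / 2), score


if __name__ == "__main__":
    feats = [(10, 10, 10, 10, 1.0), (30, 12, -10, 8, 1.0),
             (18, 25, 2, -5, 1.0), (50, 50, 5, 5, 0.5)]
    print(detect(feats))
```

    Three features whose offsets agree land in the same accumulator cell and outvote the lone outlier; in this simplified version, reporting the cell center introduces a quantization error of up to half a cell.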

    Capturing Hands in Action using Discriminative Salient Points and Physics Simulation

    Hand motion capture is a popular research field that has recently gained more attention due to the ubiquity of RGB-D sensors. However, even the most recent approaches focus on the case of a single isolated hand. In this work, we focus on hands that interact with other hands or with objects, and present a framework that successfully captures motion in such interaction scenarios for both rigid and articulated objects. Our framework combines a generative model with discriminatively trained salient points to achieve a low tracking error, and with collision detection and physics simulation to achieve physically plausible estimates even in the case of occlusions and missing visual data. Since all components are unified in a single objective function that is almost everywhere differentiable, it can be optimized with standard optimization techniques. Our approach works for monocular RGB-D sequences as well as for setups with multiple synchronized RGB cameras. For a qualitative and quantitative evaluation, we captured 29 sequences with a large variety of interactions and up to 150 degrees of freedom.
    Comment: Accepted for publication by the International Journal of Computer Vision (IJCV) on 16.02.2016 (submitted on 17.10.14). Combines into a single framework an ECCV'12 multi-camera RGB hand-tracking paper and a monocular RGB-D GCPR'14 hand-tracking paper, with several extensions, additional experiments, and details.
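    The idea of unifying a data term and a collision term in one objective that standard optimizers can handle can be illustrated with a deliberately tiny toy. It is hypothetical and not the paper's model: a single 1D fingertip coordinate `x` is pulled toward a noisy observation `x_obs` by a data term, while a penalty `lam * max(0, -x)**2` discourages penetrating an object surface at `x = 0`. The two terms are summed and minimized with plain gradient descent; the weight `lam`, the step size `lr`, and the step count are assumed values. (This particular squared hinge happens to be differentiable everywhere; the point is only that a collision-style term can share one objective with a data term.)

```python
def energy(x, x_obs, lam):
    """Toy objective: data term plus a penetration penalty (hypothetical)."""
    data = (x - x_obs) ** 2
    penetration = max(0.0, -x)  # depth below the surface at x = 0
    return data + lam * penetration ** 2


def grad(x, x_obs, lam):
    """Derivative of energy() with respect to x."""
    return 2.0 * (x - x_obs) - 2.0 * lam * max(0.0, -x)


def minimize(x_obs, lam=99.0, lr=0.004, steps=200):
    """Plain gradient descent, initialized at the observation."""
    x = x_obs
    for _ in range(steps):
        x -= lr * grad(x, x_obs, lam)
    return x


if __name__ == "__main__":
    print(minimize(-0.5))  # observation penetrates the surface
    print(minimize(0.3))   # observation is already outside the object
```

    With `x_obs = -0.5` the observation penetrates the surface, and the minimizer of the combined energy is `x_obs / (1 + lam)`, i.e. `-0.005`, so the estimate is pushed almost onto the surface. When the observation is already outside the object, the penalty is zero and the estimate stays at the observation.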

    Ghost Detection and Removal for High Dynamic Range Images: Recent Advances

    23 pages. High dynamic range (HDR) image generation and display technologies are becoming increasingly popular in various applications. A standard and commonly used approach to obtaining an HDR image is the multiple-exposure fusion technique, which combines multiple images of the same scene taken with varying exposure times. However, if the scene is not static during the acquisition of the sequence, moving objects manifest themselves as ghosting artefacts in the final HDR image. Detecting and removing ghosting artefacts is therefore an important issue for automatically generating HDR images of dynamic scenes. The aim of this paper is to provide an up-to-date review of recently proposed methods for ghost-free HDR image generation. Moreover, a classification and comparison of the reviewed methods is reported to serve as a useful guide for future research on this topic.
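    The multiple-exposure fusion idea described above, and the reason motion produces ghosts, can be sketched with a per-pixel weighted average. This is a simplified illustration, not any of the reviewed methods: it assumes aligned grayscale exposures already scaled to [0, 1], and it uses a Gaussian "well-exposedness" weight centered at 0.5 in the spirit of Mertens-style exposure fusion, but without that method's multiresolution blending. The names `well_exposedness` and `fuse_exposures` and the value `sigma=0.2` are hypothetical choices.

```python
import numpy as np


def well_exposedness(img, sigma=0.2):
    """Weight pixels by closeness to mid-gray; near 0 or 1 gets little weight."""
    return np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2))


def fuse_exposures(images, sigma=0.2, eps=1e-12):
    """Per-pixel weighted average of aligned exposures in [0, 1].

    images: list of N arrays of shape (H, W). Assumes a static scene;
    any motion between shots is blended in, which is what causes ghosts.
    """
    stack = np.stack(images).astype(float)        # (N, H, W)
    w = well_exposedness(stack, sigma) + eps      # avoid all-zero weights
    w /= w.sum(axis=0, keepdims=True)             # normalize across exposures
    return (w * stack).sum(axis=0)                # (H, W)


if __name__ == "__main__":
    dark = np.array([[0.1, 0.5]])
    bright = np.array([[0.5, 0.9]])
    print(fuse_exposures([dark, bright]))
```

    Because every output pixel blends the same pixel position across all exposures, an object that moves between shots appears at different positions in different images, and the average mixes it with the background. That mixture is the ghosting artefact the surveyed methods try to detect and remove.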

    An SVD-Based Approach for Ghost Detection and Removal in High Dynamic Range Images

    In this paper, we propose a simple method for the ghost detection problem in the context of merging multiple low dynamic range (LDR) images to form a high dynamic range (HDR) image. We show that the second-largest singular values extracted over local spatio-temporal neighbourhoods can be effectively used for ghost region detection. Furthermore, we combine the proposed method with an exposure fusion technique to generate a final HDR image free of ghosting artefacts. We present experimental results that illustrate the efficiency of the proposed method, and a quantitative comparison with other existing approaches shows the good performance of our method in detecting and removing ghosting artefacts.
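    The singular-value cue described above can be illustrated under one simplifying assumption that is ours, not necessarily the paper's: in static regions, aligned exposures differ roughly by a per-image intensity scale (ignoring saturation and camera-response nonlinearity). Stacking each exposure's local patch as one matrix column then gives a nearly rank-1 matrix whose second-largest singular value is close to zero, whereas a moving object breaks this structure and raises it. The function name `second_singular_value_map`, the square neighbourhood `radius`, and the absence of any normalization or threshold are hypothetical choices.

```python
import numpy as np


def second_singular_value_map(images, radius=1):
    """Second-largest singular value of each local spatio-temporal block.

    images: list of N aligned grayscale exposures, each of shape (H, W).
    For every pixel, the (2*radius+1)^2 neighbourhood (clipped at borders)
    of each exposure is flattened into one column of a matrix.
    """
    stack = np.stack(images).astype(float)      # (N, H, W)
    n, h, w = stack.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - radius), min(h, i + radius + 1)
            j0, j1 = max(0, j - radius), min(w, j + radius + 1)
            # Rows: pixels of the patch; columns: exposures.
            m = stack[:, i0:i1, j0:j1].reshape(n, -1).T
            s = np.linalg.svd(m, compute_uv=False)  # descending order
            out[i, j] = s[1] if len(s) > 1 else 0.0
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.uniform(0.1, 0.6, size=(8, 8))
    moved = [base * 0.5, base, (base * 1.5).copy()]
    moved[2][2:4, 2:4] = 0.95  # hypothetical object appearing in one shot
    print(np.round(second_singular_value_map(moved), 3))
```

    A ghost mask would then be obtained by thresholding this map; the threshold is left open here because the abstract does not state how it is chosen.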

    A Shape-based Statistical Method to Retrieve 2D TRUS-MR Slice Correspondence for Prostate Biopsy

    This paper presents a method based on shape context and statistical measures to match an interventional 2D transrectal ultrasound (TRUS) slice acquired during prostate biopsy to a 2D magnetic resonance (MR) slice of a pre-acquired prostate volume. Accurate biopsy tissue sampling requires transferring the MR slice information onto the TRUS-guided biopsy slice. However, this transfer, or fusion, requires knowledge of the spatial position of the TRUS slice, which is only possible with an electromagnetic (EM) tracker attached to the TRUS probe. Since EM trackers are not commonly used in clinical practice and 3D TRUS is not used during biopsy, we propose an analysis based on shape and information theory that gets close enough to the actual MR slice, as validated by experts. The Bhattacharyya distance is used to find point correspondences between the shape-context representations of the prostate contours. Thereafter, the chi-square distance is used to identify the MR slices whose prostate shapes most closely match that of the TRUS slice. Normalized mutual information (NMI) values between the TRUS slice and each of the axial MR slices are computed after rigid alignment, and subsequently a strategic elimination, based on a set of rules combining the chi-square distances and the NMI, leads to the required MR slice. We validated our method on TRUS axial slices from 15 patients: 11 results matched at least one expert's validation, and the remaining 4 were at most one slice away from the expert validations.
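    The three similarity measures named in this abstract can be sketched with common textbook definitions, which are not necessarily the exact variants used in the paper: the Bhattacharyya distance as the negative log of the Bhattacharyya coefficient, the chi-square distance with the factor 1/2 often used for shape-context matching, and NMI as `(H(A) + H(B)) / H(A, B)` computed from a joint intensity histogram. The inputs (pre-computed shape-context histograms, equally sized grayscale slices), the function names, and `bins=32` are assumptions.

```python
import numpy as np


def _normalize(h):
    h = np.asarray(h, dtype=float)
    return h / h.sum()


def bhattacharyya_distance(p, q, eps=1e-12):
    """-ln of the Bhattacharyya coefficient between two histograms."""
    p, q = _normalize(p), _normalize(q)
    bc = np.sum(np.sqrt(p * q))
    return -np.log(max(bc, eps))  # eps guards against disjoint histograms


def chi_square_distance(p, q, eps=1e-12):
    """0.5 * sum((p - q)^2 / (p + q)) between normalized histograms."""
    p, q = _normalize(p), _normalize(q)
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))


def _entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))


def normalized_mutual_information(a, b, bins=32):
    """(H(A) + H(B)) / H(A, B) from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)  # marginal distributions
    return (_entropy(px) + _entropy(py)) / _entropy(pxy.ravel())


if __name__ == "__main__":
    print(chi_square_distance([1, 0], [0, 1]))
    img = np.random.default_rng(1).uniform(size=(16, 16))
    print(normalized_mutual_information(img, img))
```

    Lower Bhattacharyya and chi-square values mean more similar histograms, while this form of NMI ranges from 1 for independent images to 2 for identical ones, so in these conventions smaller distances and larger NMI indicate a closer match.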